AITopics

Genre: Research Report > Promising Solution (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Garg, Spandan, Moghaddam, Roshanak Zilouchian, Sundaresan, Neel

PerfBench: Can Agents Resolve Real-World Performance Bugs?

arXiv.org Artificial Intelligence - Dec-4-2025

Performance bugs are inefficiencies in software that waste computational resources without causing functional failures, making them particularly challenging to detect and fix. While recent advances in Software Engineering agents have shown promise in automated bug fixing, existing benchmarks primarily focus on functional correctness and fail to evaluate agents' abilities to identify and resolve non-functional issues like performance bugs. We introduce PerfBench, a benchmark comprising 81 real-world performance bug-fixing tasks from popular .NET repositories on GitHub. Unlike existing benchmarks that rely on pre-existing test suites, PerfBench features a novel evaluation harness that allows agents to generate their own performance benchmarks and validates fixes by comparing execution metrics collected for the developer fix and the agent fix. Each task in PerfBench is derived from actual developer fixes linked to performance-related issues, which are then verified by human experts, ensuring real-world relevance. Our evaluation reveals that current state-of-the-art coding agents struggle with performance optimization tasks, with the baseline OpenHands agent achieving only a ~3% success rate on our benchmark. We develop OpenHands-Perf-Agent, which incorporates performance-aware tooling and instructions and achieves a ~20% success rate on the benchmark. We show that by ensuring the agent has proper instructions to benchmark its changes and tooling for benchmark output processing, we can improve the agent's performance significantly, but room for improvement still remains. PerfBench provides a challenging test set for furthering the capabilities of agents in fixing performance issues.
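
The validation idea in this abstract, comparing execution metrics for the developer fix against those for the agent fix, might look roughly like the sketch below. It is only an illustration: the metric fields (`mean_time_ns`, `allocated_bytes`), the 10% tolerance, and the function `agent_fix_passes` are assumptions made for this example, not PerfBench's actual harness or success criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkMetrics:
    """Execution metrics from one benchmark run (hypothetical fields)."""
    mean_time_ns: float
    allocated_bytes: float


def agent_fix_passes(developer: BenchmarkMetrics,
                     agent: BenchmarkMetrics,
                     tolerance: float = 0.10) -> bool:
    """Return True if the agent fix is no worse than the developer fix
    by more than `tolerance` on every metric (lower is better).

    The tolerance value is an assumption chosen for illustration.
    """
    limit = 1.0 + tolerance
    return (agent.mean_time_ns <= developer.mean_time_ns * limit
            and agent.allocated_bytes <= developer.allocated_bytes * limit)


if __name__ == "__main__":
    dev = BenchmarkMetrics(mean_time_ns=1_000.0, allocated_bytes=512.0)
    close = BenchmarkMetrics(mean_time_ns=1_050.0, allocated_bytes=512.0)
    slow = BenchmarkMetrics(mean_time_ns=1_500.0, allocated_bytes=512.0)
    print(agent_fix_passes(dev, close))  # within tolerance on both metrics
    print(agent_fix_passes(dev, slow))   # 50% slower than the developer fix
```

Requiring every metric to stay within tolerance keeps a fix that trades a large memory increase for a small speedup from counting as a success; a real harness would also have to deal with measurement noise across runs.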

large language model, machine learning, natural language, (20 more...)

2509.24091

Country:

Europe (0.68)
North America > United States > Washington > King County (0.14)

Genre: Research Report (0.51)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Neural Information Processing Systems - Oct-3-2025, 07:47:33 GMT

9d1827dc5f75b9d65d80e25eb862e676-AuthorFeedback.pdf

artificial intelligence, data mining, machine learning, (16 more...)

Technology:

Information Technology > Data Science > Data Mining (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

Neural Information Processing Systems - May-27-2025, 13:54:34 GMT

A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We demonstrate AutoPerf's generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs.

diagnosing software performance regression, regression testing, zero-positive learning approach, (3 more...)

Genre: Research Report > Promising Solution (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems - Feb-11-2025, 22:47:57 GMT

Reviews: A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions

STRONG POINTS/CONTRIBUTIONS: 1) The false positive rates and false negative rates observed when using AutoPerf are impressively low. NEGATIVE POINTS: 1) The paper lacks a lot of technical depth and novelty… autoencoders for anomaly detection are widely used, and the problem domain (detecting performance bugs) has been studied previously as well. Knowing what was changed in the code between P_i and P_{i+1} could be very, very helpful. DETAILED COMMENTS: One comment is that I'm not sure it makes a lot of sense to train separate autoencoders for each function (or group of functions, if you are doing the k-means thing). Likely, there are going to be certain characteristics of the distributions that are shared across all functions, and I worry that you are wasting a lot of compute power by relearning everything.
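
The anomaly-detection setup the review refers to can be illustrated with a small zero-positive sketch: a model is fit only on runs of the old program version (no examples of regressions), and a run of the new version is flagged when its reconstruction error exceeds a threshold learned from those normal runs. To stay short and dependency-free, the sketch below substitutes a per-feature mean profile for a trained autoencoder; that substitution, the feature vectors, the 3-sigma threshold, and all function names are assumptions for illustration, not AutoPerf's implementation.

```python
import math
from statistics import mean, stdev


def reconstruction_error(sample, profile):
    """Euclidean distance between a sample and its 'reconstruction'.

    Here the reconstruction is simply the mean profile, a stand-in
    for the output of a trained autoencoder.
    """
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(sample, profile)))


def fit_baseline(normal_samples):
    """Fit on normal runs only (zero-positive): no regression examples needed.

    Returns the mean profile and an error threshold of
    mean + 3 * stdev over the training errors (an assumed rule).
    """
    n_features = len(normal_samples[0])
    profile = [mean(s[j] for s in normal_samples) for j in range(n_features)]
    errors = [reconstruction_error(s, profile) for s in normal_samples]
    threshold = mean(errors) + 3 * stdev(errors)
    return profile, threshold


def is_regression(sample, profile, threshold):
    """Flag a run whose reconstruction error exceeds the learned threshold."""
    return reconstruction_error(sample, profile) > threshold


if __name__ == "__main__":
    # Hypothetical two-feature profiles collected from the old version.
    old_runs = [[100, 10], [102, 11], [98, 9], [101, 10], [99, 10]]
    profile, threshold = fit_baseline(old_runs)
    print(is_regression([100.5, 10], profile, threshold))  # close to normal runs
    print(is_regression([150, 12], profile, threshold))    # far from normal runs
```

Fitting one such model per function, as the reviewer describes, means repeating `fit_baseline` for every function's samples, which is where the concern about relearning shared structure comes from.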

anomaly detection, diagnosing software performance regression, performance bug, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Garg, Spandan, Moghaddam, Roshanak Zilouchian, Sundaresan, Neel

RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot

arXiv.org Artificial Intelligence - Jun-29-2023

Performance bugs are non-functional bugs that can even manifest in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model (such as Codex) in zero-shot to generate a fix. We compare our approach with various prompt variations and state-of-the-art methods on the task of performance bug fixing. Our evaluation shows that RAPGen can generate performance improvement suggestions equivalent to or better than a developer's in ~60% of the cases, getting ~39% of them verbatim, in an expert-verified dataset of past performance changes made by C# developers.
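
The retrieve-then-prompt flow described in this abstract can be sketched as follows. Everything concrete here is assumed for illustration: the two knowledge-base entries, the keyword-overlap retrieval, the prompt layout, and the function names are hypothetical and are not taken from RAPGen itself. The call to the language model is left out.

```python
import re

# Hypothetical knowledge base: (trigger keywords, instruction) pairs that
# stand in for instructions mined from past performance fixes.
KNOWLEDGE_BASE = [
    ({"Count"},
     "Use Any() instead of Count() > 0 when only checking for existence."),
    ({"ToList", "foreach"},
     "Avoid materializing a list with ToList() before a single enumeration."),
]

# Assumed fallback when nothing in the knowledge base matches.
DEFAULT_INSTRUCTION = "Improve the performance of this code."


def retrieve_instruction(code):
    """Return the instruction whose keywords overlap most with the code's
    identifiers, or None if no entry matches (assumed retrieval rule)."""
    tokens = set(re.findall(r"[A-Za-z_]\w*", code))
    best_score, best_instruction = 0, None
    for keywords, instruction in KNOWLEDGE_BASE:
        score = len(keywords & tokens)
        if score > best_score:
            best_score, best_instruction = score, instruction
    return best_instruction


def build_prompt(code):
    """Assemble a zero-shot prompt from the retrieved instruction and code."""
    instruction = retrieve_instruction(code) or DEFAULT_INSTRUCTION
    return f"// Instruction: {instruction}\n{code}\n// Fixed code:\n"


if __name__ == "__main__":
    snippet = "if (items.Count() > 0) { Process(items); }"
    print(build_prompt(snippet))
```

The resulting prompt string would then be sent to a code-generation model; a realistic retrieval step would likely match on richer features than raw identifier overlap.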

large language model, machine learning, natural language, (22 more...)

2306.17077

Country:

North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence - Mar-5-2023

Understanding Bugs in Multi-Language Deep Learning Frameworks

Li, Zengyang, Wang, Sicheng, Wang, Wenshuo, Liang, Peng, Mo, Ran, Li, Bing

Deep learning frameworks (DLFs) have been playing an increasingly important role in this intelligence age since they act as a basic infrastructure for an increasingly wide range of AI-based applications. Meanwhile, as multi-programming-language (MPL) software systems, DLFs are inevitably suffering from bugs caused by the use of multiple programming languages (PLs). Hence, it is of paramount significance to understand the bugs (especially the bugs involving multiple PLs, i.e., MPL bugs) of DLFs, which can provide a foundation for preventing, detecting, and resolving bugs in the development of DLFs. To this end, we manually analyzed 1497 bugs in three MPL DLFs, namely MXNet, PyTorch, and TensorFlow. First, we classified bugs in these DLFs into 12 types (e.g., algorithm design bugs and memory bugs) according to their bug labels and characteristics. Second, we further explored the impacts of different bug types on the development of DLFs, and found that deployment bugs and memory bugs negatively impact the development of DLFs in different aspects the most. Third, we found that 28.6%, 31.4%, and 16.0% of bugs in MXNet, PyTorch, and TensorFlow are MPL bugs, respectively; the PL combination of Python and C/C++ is most used in fixing more than 92% of MPL bugs in all DLFs. Finally, the code change complexity of MPL bug fixes is significantly greater than that of single-programming-language (SPL) bug fixes in all three DLFs, while in PyTorch MPL bug fixes have longer open time and greater communication complexity than SPL bug fixes. These results provide insights for bug management in DLFs.

artificial intelligence, dlf, machine learning, (20 more...)

2303.02695

Country: Asia > China > Hubei Province > Wuhan (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence - Dec-3-2021

Characterizing Performance Bugs in Deep Learning Systems

Cao, Junming, Chen, Bihuan, Sun, Chao, Hu, Longjie, Peng, Xin

Deep learning (DL) has been increasingly applied to a variety of domains. The programming paradigm shift from traditional systems to DL systems poses unique challenges in engineering DL systems. Performance is one of the challenges, and performance bugs (PBs) in DL systems can cause severe consequences such as excessive resource consumption and financial loss. While bugs in DL systems have been extensively investigated, PBs in DL systems have hardly been explored. To bridge this gap, we present the first comprehensive study to characterize symptoms, root causes, and introducing and exposing stages of PBs in DL systems developed in TensorFlow and Keras, with a total of 238 PBs collected from 225 StackOverflow posts. Our findings shed light on the implications for developing high-performance DL systems, and for detecting and localizing PBs in DL systems. We also build the first benchmark of 56 PBs in DL systems, and assess the capability of existing approaches in tackling them. Moreover, we develop a static checker, DeepPerf, to detect three types of PBs, and identify 488 new PBs in 130 GitHub projects. 62 and 18 of them have been respectively confirmed and fixed by developers.

dl system, proceedings, software engineering, (15 more...)

2112.01771

Country: Asia > China > Shanghai > Shanghai (0.05)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence - Apr-1-2020, 03:41:36 GMT

Tool Finds Software Update Bugs In Hours, Not Days - aster.cloud

It's a common frustration: software updates intended to make our applications run faster inadvertently end up doing just the opposite. These bugs, called performance regressions in the field of computer science, are time-consuming to fix because locating software errors normally requires substantial human intervention. To overcome this obstacle, researchers at Texas A&M University, in collaboration with computer scientists at Intel Labs, developed a completely automated way of identifying the source of the errors. Their algorithm, based on a specialized form of machine learning called deep learning, is not only turnkey, but also quick. It finds performance bugs in a matter of a few hours instead of days.

algorithm, software, software update bug, (10 more...)

#artificialintelligence

Country: North America > United States > Texas (0.27)

Industry: Information Technology (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.32)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.32)